fix(ai): stop discarding exploit mode; wire + document it (D2) by ocervell · Pull Request #1263 · freelabz/secator

ocervell · 2026-07-01T17:09:10Z

Finding — D2: `exploit` mode half-wired (P4 Pertinence)

The selection prompt offers exploit, and the full mode exists
(`MODES["exploit"]`, `SYSTEM_EXPLOIT`, the `get_system_prompt` branch, and
`modes/_selection.txt`, which classifies attack/chat/exploit), but detection
threw the classification away and the `mode` opt help omitted it.

Root cause

- `secator/tasks/ai.py:764` (pre-change): `_detect_mode` accepted only
  `("attack", "chat")` from the intent LLM; an `exploit` verdict hit the
  `else` branch and reverted to `old_mode or "chat"`. Since D4's `fast_detect_mode`
  already defers exploit-ish prompts to the LLM, the LLM could return
  `exploit`; it was simply discarded here.
- `secator/tasks/ai.py:74`: the `mode` opt help hardcoded "Mode: attack or chat".
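For illustration, the pre-change control flow can be reduced to a few lines. This is a minimal sketch, not the actual method: `MODES` is stubbed as a plain dict, and `detect_mode_before` is a hypothetical standalone function standing in for `_detect_mode`, which in secator lives on the task class and calls the intent LLM.

```python
from typing import Optional

# Hypothetical stand-in for secator's MODES registry; only the keys matter here.
MODES = {"attack": {}, "chat": {}, "exploit": {}}


def detect_mode_before(verdict: str, old_mode: Optional[str]) -> str:
    """Sketch of the pre-change rule: only two verdicts were accepted."""
    if verdict in ("attack", "chat"):
        return verdict
    # An "exploit" verdict lands here and is silently discarded,
    # even though "exploit" is a key of MODES.
    return old_mode or "chat"


print(detect_mode_before("exploit", None))      # chat
print(detect_mode_before("exploit", "attack"))  # attack
```

The hardcoded tuple is the whole bug: the registry already knew about `exploit`, but the check never consulted it.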

Changes

- `_detect_mode`: accept any mode in `MODES` (adds exploit; attack/chat
  behavior identical). `secator/tasks/ai.py:764`
- `mode` opt help derived from `MODES.keys()`: single source of truth, no
  drift (`f"Mode: {', '.join(MODES)}"`). `secator/tasks/ai.py:74`
- Imported `MODES` into the task module (DRY; no third hardcoded mode tuple).
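The two changes above can be sketched together, assuming (as in the previous stub) that `MODES` is a dict keyed by mode name. `detect_mode_after` and `MODE_HELP` are hypothetical names for illustration; the help string uses the exact f-string quoted in the PR. The printed order reflects this stub's insertion order, and the real `MODES` may be ordered differently.

```python
from typing import Optional

# Hypothetical stand-in for secator's MODES registry (keys are the mode names).
MODES = {"attack": {}, "chat": {}, "exploit": {}}


def detect_mode_after(verdict: str, old_mode: Optional[str]) -> str:
    """Sketch of the post-change rule: any mode registered in MODES is accepted."""
    if verdict in MODES:
        return verdict
    # Unknown verdicts keep the previous fallback behavior.
    return old_mode or "chat"


# The opt help is built from the same dict, so adding a mode updates both places.
MODE_HELP = f"Mode: {', '.join(MODES)}"

print(detect_mode_after("exploit", None))  # exploit
print(MODE_HELP)                           # Mode: attack, chat, exploit
```

Iterating a dict yields its keys, so `', '.join(MODES)` is equivalent to joining `MODES.keys()`.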

No change to the exploit SAFETY posture — exploit still runs through the same
PermissionEngine/guardrails; only detection + docs changed.

Exploit prompt is real

`secator/ai/prompts/modes/exploit.txt` is a full 43-line template (persona =
"exploitation verification specialist", methodology, `add_finding`
exploitation-report flow, `${guardrails}`/`${isolation}` constraints).
`get_system_prompt("exploit", ...)` renders with no leftover `${include}` or
`$template_var` placeholders (D1's `$query_types`/`$output_types_reference`
substitution covers exploit too).
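A "renders clean" check of this kind can be sketched with the standard library. This is not secator's renderer: it assumes the prompt files use Python `string.Template` syntax (`${guardrails}`, `$query_types`), which the placeholder names suggest but the PR does not confirm. `render_and_check` and the template fragment are hypothetical.

```python
import re
from string import Template
from typing import List, Tuple

# Any "$name" or "${name}" still present after rendering counts as unresolved.
UNRESOLVED = re.compile(r"\$\{?([A-Za-z_]\w*)\}?")


def render_and_check(template_text: str, **values: str) -> Tuple[str, List[str]]:
    """Render with safe_substitute (which leaves unknown placeholders intact
    instead of raising) and return the text plus any placeholder names left."""
    rendered = Template(template_text).safe_substitute(values)
    return rendered, UNRESOLVED.findall(rendered)


# Hypothetical fragment shaped like the exploit prompt's constraints section.
fragment = "Constraints:\n${guardrails}\n${isolation}\nTypes: $query_types"

_, missing = render_and_check(fragment, guardrails="G", isolation="I")
print(missing)  # ['query_types']

_, missing = render_and_check(fragment, guardrails="G", isolation="I", query_types="Q")
print(missing)  # []
```

Using `safe_substitute` rather than `substitute` matters for a test like this: it lets the check report every missing name instead of stopping at the first `KeyError`.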

Tests

Added focused tests (`tests/unit/test_ai_task_opts.py`,
`tests/unit/test_ai_prompts.py`):

- An LLM `_detect_mode` verdict of `exploit` now sets `self.mode == "exploit"`
  (previously fell back to chat).
- attack / chat / unknown verdicts unchanged (unknown → chat fallback).
- `get_system_prompt("exploit")` renders clean (no unresolved placeholders).
- `mode` opt help lists every mode, including exploit.
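The real tests exercise the task class and mock the intent LLM. The sketch below only shows the shape of the verdict table, run against a hypothetical `detect_mode` stand-in that mirrors the post-change rule, so it runs without secator installed. The functions follow pytest's `test_*` naming but need nothing beyond plain `assert`.

```python
from typing import Optional

# Hypothetical stand-ins: a MODES stub and a resolver mirroring the post-change rule.
MODES = {"attack": {}, "chat": {}, "exploit": {}}


def detect_mode(verdict: str, old_mode: Optional[str] = None) -> str:
    return verdict if verdict in MODES else (old_mode or "chat")


# (LLM verdict, expected mode) pairs mirroring the cases listed above.
CASES = [
    ("exploit", "exploit"),  # previously fell back to chat
    ("attack", "attack"),
    ("chat", "chat"),
    ("not-a-mode", "chat"),  # unknown -> chat fallback
]


def test_detect_mode_verdicts() -> None:
    for verdict, expected in CASES:
        assert detect_mode(verdict) == expected, (verdict, expected)


def test_mode_help_lists_every_mode() -> None:
    help_text = f"Mode: {', '.join(MODES)}"
    for name in MODES:
        assert name in help_text, name


if __name__ == "__main__":
    test_detect_mode_verdicts()
    test_mode_help_lists_every_mode()
    print("ok")
```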

Baseline vs. after (`test_ai_loop.py`, `test_ai_session.py`, `test_ai_prompts.py`):
12 failed / 85 passed at baseline → 12 failed / 86 passed after. The 12
failures are the same pre-existing environment failures (shfmt/safecmd sandbox),
not regressions; the +1 pass is the new exploit-render test.

Related smells (not fixed here)

- `secator/ai/prompts.py:248`: `if mode in ("attack", "exploit"):` hardcodes
  the "uses library reference" set; a new library-using mode would need a
  manual edit. Candidate to derive from mode config.
- `secator/tasks/ai.py:66`: class docstring still says "(attack or chat
  mode)"; omits exploit.
- `secator/ai/prompts/modes/_selection.txt` hardcodes the three mode names in
  prose, so it can drift from `MODES` if a mode is added or removed.
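Both smells could be addressed along the lines sketched below, under loud assumptions: the per-mode `uses_library_reference` flag is hypothetical (not an existing secator field), as are `uses_library_reference()` and `modes_missing_from_prose()`. The drift check only works in one direction: it flags modes in `MODES` that the prose never mentions, not stale names the prose still mentions.

```python
import re
from typing import Dict, List

# Hypothetical mode config: "uses_library_reference" is an assumed flag,
# not an existing secator field.
MODES: Dict[str, dict] = {
    "attack": {"uses_library_reference": True},
    "chat": {"uses_library_reference": False},
    "exploit": {"uses_library_reference": True},
}


def uses_library_reference(mode: str) -> bool:
    """Would replace a hardcoded `mode in ("attack", "exploit")` check."""
    return MODES.get(mode, {}).get("uses_library_reference", False)


def modes_missing_from_prose(selection_text: str) -> List[str]:
    """Drift check: names in MODES that the selection prompt never mentions."""
    return [m for m in MODES if not re.search(rf"\b{re.escape(m)}\b", selection_text)]


selection = "Classify the request as attack, chat or exploit."
print([m for m in MODES if uses_library_reference(m)])  # ['attack', 'exploit']
print(modes_missing_from_prose(selection))              # []
```

A check like `modes_missing_from_prose` fits naturally in a unit test that reads the real `_selection.txt`, so adding a mode without updating the prose fails CI.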

🤖 Generated with Claude Code

`_detect_mode` accepted only ("attack", "chat") from the intent LLM and discarded an "exploit" classification (falling back to old_mode/chat), even though the full exploit mode exists (MODES entry, SYSTEM_EXPLOIT prompt, get_system_prompt branch, _selection.txt classifies it). So exploit mode was unreachable by detection, and the `mode` opt help omitted it.

- _detect_mode: accept any mode in MODES (incl. exploit); attack/chat unchanged.
- `mode` opt help derived from MODES.keys() (DRY, no drift).
- Tests: LLM "exploit" verdict now sets mode=exploit; attack/chat/unknown unchanged; exploit system prompt renders with no leftover template vars.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-07-01T17:09:23Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


ocervell merged commit 2a9f25a into ai-resiliency Jul 2, 2026
1 check passed

ocervell deleted the fix/exploit-mode-wiring branch July 2, 2026 15:34

ocervell mentioned this pull request Jul 2, 2026

feat(ai): AI-task reliability, robustness & security hardening (27 fixes) #1241

Open

ocervell merged 1 commit into ai-resiliency from fix/exploit-mode-wiring